NicheToolBox: An example of the model selection protocol for ellipsoid models
1 The example
We demonstrate the use of the model selection protocol implemented in ntbox by retrieving occurrence information from the species Leopardus wiedii, a small cat that distributes in the Neotropics (Fig. 1). Then we use a native function of the package to calibrate and select the best models for L. wiedii.
Figure 1. Leopardus wiedii. The image is taken from (Sanchez et al. 1998)
First, we set the random seed in order to make reproducible the example
2 Get the data for the example
From ntbox, we downloaded available occurrences for L. wiedii from the Global Biodiversity Information Facility (https://www.gbif.org) and explore what is the provenance and date of collecting of these points (Fig. 2).
dAll_a <- ntbox::searh_gbif_data(genus = "Leopardus",
species = "wiedii",
occlim = 5000,
leafletplot = TRUE)We select those records starting in 1950 as we will use the bioclimatic layers from WorldClim for the modeling process.
2.1 Environmental data
Now, we use the function getData from the package raster (Hijmans 2017) to download the WorldClim at 2.5 ArcMinutes of resolution (~ 4.65 km at Ecuador)
2.2 Crop and mask environmental data
Reading a shapefile for America
amp <- normalizePath("../america_sin_islas")
am <- rgdal::readOGR(dsn =amp,layer="america_sn_islas")Cut the layers using America as a mask
2.3 Generate environmental background data
We will generate the random environmental points that will be used to estimate the Partial ROC test (see (Owens et al. 2012; Cobos et al. 2019)) of the calibrated models.
## Warning: [ONE-TIME WARNING] Forked processing ('multicore') is disabled
## in future (>= 1.13.0) when running R from RStudio, because it is
## considered unstable. Because of this, plan("multicore") will fall
## back to plan("sequential"), and plan("multiprocess") will fall back to
## plan("multisession") - not plan("multicore") as in the past. For more
## details, how to control forked processing or not, and how to silence this
## warning in future R sessions, see ?future::supportsMulticore
3 Curate data from duplicated records
First, we extract environmental information from occurrences
dAll_env <- raster::extract(am_env,dAll[,2:3])
dAll_genv <- data.frame(dAll[,c(2:3,ncol(dAll))],
dAll_env)We remove duplicated data
dAll_genv_clean <- ntbox::clean_dup(dAll_genv,
longitude ="longitude",
latitude = "latitude",
threshold = res(am_env)[1])
# remove NA's
dAll_genv_clean <- na.omit(dAll_genv_clean)See the data on a leaflet map
m <- leaflet::leaflet(dAll_genv_clean)
m <- m %>% leaflet::addTiles()
m <- m %>% leaflet::addCircleMarkers(lng = ~longitude,
lat = ~latitude,
popup = ~leaflet_info,
fillOpacity = 0.25,
radius = 7)
mRemove wired occurrences. Click on the pop-up to display gbif information (available when the downloaded data comes from search_gbif function), the points that are outside the distribution are the one on San Francisco (this comes from a collection; rowID=400), the record on Florida (rowID=244) and the point that its on latitude and longitude zero (rowID=310).
rnames <- as.numeric(rownames(dAll_genv_clean))
# Indixes of the wired data (can change depending the date of the gbif query)
to_rmIDs <- c(400,244,310)
to_rm <- which(rnames %in% to_rmIDs)
dAll_genv_clean <- dAll_genv_clean[-to_rm,]Explore the curated data
4 Split the data in train and testing
Now we will create train and testing data using a proportion of 70:30 respectively
Geographic train and test data
Environmental train and test
6 Ellipsoid calibration and selection
For calibrating Minimum Volume Ellipsoid Models (Van Aelst and Rousseeuw 2009) we will use a proportion of 0.95 of the training data for 3,5 and 6 variables. The omission rate criteria is 0.05 (5%). The models will be calibrated and selected in parallel; each job will process 100 models.
nvarstest <- c(3,4,5)
t1 <- system.time({
e_selct <- ntbox::ellipsoid_selection(env_train = dtrain_env,
env_test = dtest_env,
env_vars = env_vars,
level = 0.95,
nvarstest = nvarstest,
env_bg = env_bg0,
omr_criteria=0.05,
parallel = TRUE,
comp_each = 100,proc = TRUE)
})## -----------------------------------------------------------------------------------------
## **** Starting model selection process ****
## -----------------------------------------------------------------------------------------
##
## A total number of 455 models will be created for combinations of 15 variables taken by 3
##
## A total number of 1365 models will be created for combinations of 15 variables taken by 4
##
## A total number of 3003 models will be created for combinations of 15 variables taken by 5
##
## -----------------------------------------------------------------------------------------
## **A total number of 4823 models will be tested **
##
## -----------------------------------------------------------------------------------------
## Doing calibration from model 1 to 100 in process 1
##
## Doing calibration from model 101 to 200 in process 2
##
## Doing calibration from model 201 to 300 in process 3
##
## Doing calibration from model 301 to 400 in process 4
##
## Doing calibration from model 401 to 500 in process 5
##
## Doing calibration from model 501 to 600 in process 6
##
## Doing calibration from model 601 to 700 in process 7
##
## Doing calibration from model 701 to 800 in process 8
##
## Doing calibration from model 801 to 900 in process 9
##
## Doing calibration from model 901 to 1000 in process 10
##
## Doing calibration from model 1001 to 1100 in process 11
##
## Doing calibration from model 1101 to 1200 in process 12
##
## Doing calibration from model 1201 to 1300 in process 13
##
## Doing calibration from model 1301 to 1400 in process 14
##
## Doing calibration from model 1401 to 1500 in process 15
##
## Doing calibration from model 1501 to 1600 in process 16
##
## Doing calibration from model 1601 to 1700 in process 17
##
## Doing calibration from model 1701 to 1800 in process 18
##
## Doing calibration from model 1801 to 1900 in process 19
##
## Doing calibration from model 1901 to 2000 in process 20
##
## Doing calibration from model 2001 to 2100 in process 21
##
## Doing calibration from model 2101 to 2200 in process 22
##
## Doing calibration from model 2201 to 2300 in process 23
##
## Doing calibration from model 2301 to 2400 in process 24
##
## Doing calibration from model 2401 to 2500 in process 25
##
## Doing calibration from model 2501 to 2600 in process 26
##
## Doing calibration from model 2601 to 2700 in process 27
##
## Doing calibration from model 2701 to 2800 in process 28
##
## Doing calibration from model 2801 to 2900 in process 29
##
## Doing calibration from model 2901 to 3000 in process 30
##
## Doing calibration from model 3001 to 3100 in process 31
##
## Doing calibration from model 3101 to 3200 in process 32
##
## Doing calibration from model 3201 to 3300 in process 33
##
## Doing calibration from model 3301 to 3400 in process 34
##
## Doing calibration from model 3401 to 3500 in process 35
##
## Doing calibration from model 3501 to 3600 in process 36
##
## Doing calibration from model 3601 to 3700 in process 37
##
## Doing calibration from model 3701 to 3800 in process 38
##
## Doing calibration from model 3801 to 3900 in process 39
##
## Doing calibration from model 3901 to 4000 in process 40
##
## Doing calibration from model 4001 to 4100 in process 41
##
## Doing calibration from model 4101 to 4200 in process 42
##
## Doing calibration from model 4201 to 4300 in process 43
##
## Doing calibration from model 4301 to 4400 in process 44
##
## Doing calibration from model 4401 to 4500 in process 45
##
## Doing calibration from model 4501 to 4600 in process 46
##
## Doing calibration from model 4601 to 4700 in process 47
##
## Doing calibration from model 4701 to 4800 in process 48
##
## Doing calibration from model 4801 to 4823 in process 49
##
## Finishing calibration of models 3801 to 3900
##
## Finishing calibration of models 3901 to 4000
##
## Finishing calibration of models 4001 to 4100
##
## Finishing calibration of models 4101 to 4200
##
## Finishing calibration of models 4201 to 4300
##
## Finishing calibration of models 4301 to 4400
##
## Finishing calibration of models 4401 to 4500
##
## Finishing calibration of models 4501 to 4600
##
## Finishing calibration of models 4601 to 4700
##
## Finishing calibration of models 4701 to 4800
##
## Finishing calibration of models 4801 to 4823
##
## Finishing calibration of models 1 to 100
##
## Finishing calibration of models 101 to 200
##
## Finishing calibration of models 201 to 300
##
## Finishing calibration of models 301 to 400
##
## Finishing calibration of models 401 to 500
##
## Finishing calibration of models 501 to 600
##
## Finishing calibration of models 601 to 700
##
## Finishing calibration of models 701 to 800
##
## Finishing calibration of models 901 to 1000
##
## Finishing calibration of models 801 to 900
##
## Finishing calibration of models 1001 to 1100
##
## Finishing calibration of models 1101 to 1200
##
## Finishing calibration of models 1201 to 1300
##
## Finishing calibration of models 1301 to 1400
##
## Finishing calibration of models 1401 to 1500
##
## Finishing calibration of models 1501 to 1600
##
## Finishing calibration of models 1601 to 1700
##
## Finishing calibration of models 1701 to 1800
##
## Finishing calibration of models 1801 to 1900
##
## Finishing calibration of models 1901 to 2000
##
## Finishing calibration of models 2001 to 2100
##
## Finishing calibration of models 2101 to 2200
##
## Finishing calibration of models 2201 to 2300
##
## Finishing calibration of models 2301 to 2400
##
## Finishing calibration of models 2401 to 2500
##
## Finishing calibration of models 2501 to 2600
##
## Finishing calibration of models 2601 to 2700
##
## Finishing calibration of models 2701 to 2800
##
## Finishing calibration of models 2801 to 2900
##
## Finishing calibration of models 2901 to 3000
##
## Finishing calibration of models 3001 to 3100
##
## Finishing calibration of models 3101 to 3200
##
## Finishing calibration of models 3201 to 3300
##
## Finishing calibration of models 3301 to 3400
##
## Finishing calibration of models 3401 to 3500
##
## Finishing calibration of models 3501 to 3600
##
## Finishing calibration of models 3601 to 3700
##
## Finishing calibration of models 3701 to 3800
##
## Finishing...
##
## -----------------------------------------------------------------------------------------
## 246 models passed omr_criteria for train data
## 52 models passed omr_criteria for test data
## 13 models passed omr_criteria for train and test data
The elapsed time in minutes
## user system elapsed
## 0.24813333 0.01436667 40.67981667
Now we show the results for 10 best models. The table contains eleven fields:
- fitted_vars: The fitted variables
- nvars: Number of variables used to fit the ellipsoid model
- om_rate_train: Omission rate of training data
- om_rate_test: Omission rate of testing data
- bg_prevalence: The estimated prevalence of the species in background data
- pval_bin: The p-value of the binomial test (see (Peterson, Papes, and Soberon 2008)) performed in environmental space
- pval_proc: The p-value of the partial ROC test performed in environmental space
- env_bg_aucratio: Environmental background AUC ratio
- mean_omr_train_test: Mean omission rate of testing and training data
- rank_by_omr_train_test: The rank of the models given testing and training data
| fitted_vars | nvars | om_rate_train | om_rate_test | bg_prevalence | pval_bin | pval_proc | env_bg_aucratio | mean_omr_train_test | rank_by_omr_train_test | rank_omr_aucratio |
|---|---|---|---|---|---|---|---|---|---|---|
| bio4,bio9,bio13 | 3 | 0.0379747 | 0.0427807 | 0.3023024 | 0 | 0 | 1.524967 | 0.0403777 | 13 | 1 |
| bio4,bio6,bio12,bio19 | 4 | 0.0126582 | 0.0481283 | 0.2999061 | 0 | 0 | 1.517946 | 0.0303933 | 1 | 2 |
| bio4,bio6,bio12 | 3 | 0.0379747 | 0.0481283 | 0.3099902 | 0 | 0 | 1.515569 | 0.0430515 | 23 | 3 |
| bio4,bio7,bio12 | 3 | 0.0379747 | 0.0481283 | 0.3250265 | 0 | 0 | 1.509778 | 0.0430515 | 24 | 4 |
| bio6,bio8,bio13,bio19 | 4 | 0.0253165 | 0.0481283 | 0.3154216 | 0 | 0 | 1.502200 | 0.0367224 | 4 | 5 |
| bio7,bio8,bio13,bio19 | 4 | 0.0253165 | 0.0481283 | 0.3247269 | 0 | 0 | 1.491501 | 0.0367224 | 5 | 6 |
| bio1,bio6,bio13,bio19 | 4 | 0.0253165 | 0.0481283 | 0.3316760 | 0 | 0 | 1.475350 | 0.0367224 | 6 | 7 |
| bio7,bio13,bio19 | 3 | 0.0379747 | 0.0427807 | 0.3759660 | 0 | 0 | 1.445673 | 0.0403777 | 14 | 8 |
| bio3,bio7,bio12,bio18,bio19 | 5 | 0.0253165 | 0.0481283 | 0.3643643 | 0 | 0 | 1.436196 | 0.0367224 | 7 | 9 |
| bio2,bio9,bio13 | 3 | 0.0379747 | 0.0481283 | 0.3907826 | 0 | 0 | 1.422951 | 0.0430515 | 25 | 10 |
7 Project the best model
We will project the best model according to the above table. For another complete example see the help of ?ntbox::ellipsoid_selection.
# Select the model number one in table e_select
bestvarcomb <- stringr::str_split(e_selct$fitted_vars,",")[[1]]Fit the model in environmental space.
best_mod <- ntbox::cov_center(dtrain_env[,bestvarcomb],
mve = T,
level = 0.99,
vars = 1:length(bestvarcomb))mProj <- ntbox::ellipsoidfit(am_env[[bestvarcomb]],
centroid = best_mod$centroid,
covar = best_mod$covariance,
level = 0.99,size = 3)
if(length(bestvarcomb)==3)
rgl::rglwidget()Project the model in geographic space
References
Cobos, Marlon E., A. Townsend Peterson, Narayani Barve, and Luis Osorio-Olvera. 2019. “kuenm: an R package for detailed development of ecological niche models using Maxent.” PeerJ 7 (February): e6281. https://doi.org/10.7717/peerj.6281.
Hijmans, Robert J. 2017. “raster: Geographic analysis and modeling with raster data.” http://cran.r-project.org/package=raster.
Owens, H, L Campbell, L Dornak, E Saupe, N Barve, J Soberon, K Ingenloff, A Lira-Noriega, C Hensz, and C Myers. 2012. “Constraints on Interpretation of Ecological Niche Models by Limited Environmental Ranges on Calibration Areas: Software Script Appendix.”
Peterson, A. Townsend, Monica Papes, and Jorge Soberon. 2008. “Rethinking receiver operating characteristic analysis applications in ecological niche modeling.” Ecol. Modell. 213 (1): 63–72.
Sanchez, O., M. A. Pineda, H. Benitez., B. González, and H. Berlanga. 1998. Guía de identificación para las aves y mamíferos silvestres de mayor comercio en México protegidos por la C.I.T.E.S. Edited by SEMARNAP and CONABIO. México.
Van Aelst, Stefan, and Peter Rousseeuw. 2009. “Minimum volume ellipsoid.” Wiley Interdiscip. Rev. Comput. Stat. 1 (1): 71–82. https://doi.org/10.1002/wics.19.